AITopics

2502.20268

Country:

Oceania > Australia (0.28)
Europe > Austria > Vienna (0.14)
Europe > Croatia (0.14)
(6 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Koldasbayeva, Diana, Zaytsev, Alexey

Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling

arXiv.org Artificial IntelligenceJan-27-2025

Species Distribution Models (SDMs) often suffer from spatial autocorrelation (SAC), leading to biased performance estimates. We tested cross-validation (CV) strategies - random splits, spatial blocking with varied distances, environmental (ENV) clustering, and a novel spatio-temporal method - under two proposed training schemes: LAST FOLD, widely used in spatial CV at the cost of data loss, and RETRAIN, which maximizes data usage but risks reintroducing SAC. LAST FOLD consistently yielded lower errors and stronger correlations. Spatial blocking at an optimal distance (SP 422) and ENV performed best, achieving Spearman and Pearson correlations of 0.485 and 0.548, respectively, although ENV may be unsuitable for long-term forecasts involving major environmental shifts. A spatio-temporal approach yielded modest benefits in our moderately variable dataset, but may excel with stronger temporal changes. These findings highlight the need to align CV approaches with the spatial and temporal structure of SDM data, ensuring rigorous validation and reliable predictive outcomes.

artificial intelligence, correlation, machine learning, (19 more...)

2502.0348

Country:

Europe > Austria > Vienna (0.14)
Europe > Norway (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Yıldız, A. Yarkın, Kalayci, Asli

Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data

arXiv.org Artificial IntelligenceOct-11-2024

Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Having near-precise decisions based on correct diagnosis can affect a patient's life itself, and may extremely result in a catastrophe if not classified correctly. Several traditional machine learning (ML), such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to the superior performances, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.

artificial intelligence, deep learning, machine learning, (18 more...)

2410.03705

Country:

North America > United States > Massachusetts > Worcester County > Worcester (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > India (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.71)
Health & Medicine > Therapeutic Area > Oncology (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

#artificialintelligenceOct-31-2022, 03:10:09 GMT

AutoML

This may revolutionize data science: we introduce TabPFN, a new tabular data classification method that takes 1 second & yields SOTA performance (competitive with the best AutoML pipelines in an hour). So far, it is limited in scale, though: it can only tackle problems up to 1000 training examples, 100 features and 10 classes. TabPFN is radically different from previous ML methods. It is a meta-learned algorithm and it provably approximates Bayesian inference with a prior for principles of causality and simplicity. TabPFN happens to be a single transformer, but this is not the usual "trees vs nets" b a t t l e.

approximate bayesian inference, dataset, tabpfn, (14 more...)

Country: Europe > Germany > Baden-Württemberg > Freiburg (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)

Madan, Manav, Jakob, Peter, Schmid-Schirling, Tobias, Valada, Abhinav

Multi-Perspective Anomaly Detection

arXiv.org Artificial IntelligenceMay-20-2021

Multi-view classification is inspired by the behavior of humans, especially when fine-grained features or in our case rarely occurring anomalies are to be detected. Current contributions point to the problem of how high-dimensional data can be fused. In this work, we build upon the deep support vector data description algorithm and address multi-perspective anomaly detection using three different fusion techniques i.e. early fusion, late fusion, and late fusion with multiple decoders. We employ different augmentation techniques with a denoising process to deal with scarce one-class data, which further improves the performance (ROC AUC = 80\%). Furthermore, we introduce the dices dataset that consists of over 2000 grayscale images of falling dices from multiple perspectives, with 5\% of the images containing rare anomalies (e.g. drill holes, sawing, or scratches). We evaluate our approach on the new dices dataset using images from two different perspectives and also benchmark on the standard MNIST dataset. Extensive experiments demonstrate that our proposed approach exceeds the state-of-the-art on both the MNIST and dices datasets. To the best of our knowledge, this is the first work that focuses on addressing multi-perspective anomaly detection in images by jointly using different perspectives together with one single objective function for anomaly detection.

anomaly detection, dataset, detection, (13 more...)

2105.09903

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceNov-16-2020

Graph embeddings via matrix factorization for link prediction: smoothing or truncating negatives?

Agibetov, Asan

Link prediction -- the process of uncovering missing links in a complex network -- is an important problem in information sciences, with applications ranging from social sciences to molecular biology. Recent advances in neural graph embeddings have proposed an end-to-end way of learning latent vector representations of nodes, with successful application in link prediction tasks. Yet, our understanding of the internal mechanisms of such approaches has been rather limited, and only very recently we have witnessed the development of a very compelling connection to the mature matrix factorization theory. In this work, we make an important contribution to our understanding of the interplay between the skip-gram powered neural graph embedding algorithms and the matrix factorization via SVD. In particular, we show that the link prediction accuracy of graph embeddings strongly depends on the transformations of the original graph co-occurrence matrix that they decompose, sometimes resulting in staggering boosts of accuracy performance on link prediction tasks. Our improved approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes yields results on par with the state-of-the-art link prediction performance achieved by a complex neural graph embedding model

factorization, graph, matrix, (15 more...)

2011.09907

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.04)
North America > Puerto Rico (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

#artificialintelligenceAug-6-2020, 20:09:10 GMT

ROC Curve in Machine Learning

The Receiver Operating Characteristic (ROC) curve is a popular tool used with binary classifiers. It is very similar to the precision/recall curve. Still, instead of plotting precision versus recall, the ROC curve plots the true positive rate (another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to 1 – the true negative rate (TNR), which is the ratio of negative cases that are correctly classified as negative.

artificial intelligence, classifier, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligenceFeb-11-2020, 19:47:38 GMT

Cost-Sensitive Decision Trees for Imbalanced Classification

The decision tree algorithm is effective for balanced classification, although it does not perform well on imbalanced datasets. The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation, when in fact, the examples from the minority class are being ignored. This problem can be overcome by modifying the criterion used to evaluate split points to take the importance of each class into account, referred to generally as the weighted split-point or weighted decision tree. In this tutorial, you will discover the weighted decision tree for imbalanced classification.

dataset, decision tree, decision tree algorithm, (11 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Brereton, Andrew E., MacKinnon, Stephen, Safikhani, Zhaleh, Reeves, Shawn, Alwash, Sana, Shahani, Vijay, Windemuth, Andreas

Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)

arXiv.org Machine LearningFeb-11-2020

The prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) of small molecules from their molecular structure is a central problem in medicinal chemistry with great practical importance in drug discovery. Creating predictive models conventionally requires substantial trial-and-error for the selection of molecular representations, machine learning (ML) algorithms, and hyperparameter tuning. A generally applicable method that performs well on all datasets without tuning would be of great value but is currently lacking. Here, we describe Pareto-Optimal Embedded Modeling (POEM), a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without need for optimization. POEMs predictive strength is obtained by combining multiple different representations of molecular structures in a context-specific manner, while maintaining low dimensionality. We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classifications tasks. POEM performs well in all cases and reduces the risk of overfitting.

artificial intelligence, machine learning, molecule, (18 more...)

arXiv.org Machine Learning

2002.04555

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

#artificialintelligenceJan-30-2020, 08:37:33 GMT

Cost-Sensitive Logistic Regression for Imbalanced Classification

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that is used to influence the amount that logistic regression coefficients are updated during training. The weighting can penalize the model less for errors made on examples from the majority class and penalize the model more for errors made on examples from the minority class. The result is a version of logistic regression that performs better on imbalanced classification tasks, generally referred to as cost-sensitive or weighted logistic regression.

logistic regression, regression, weighting, (13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)